
Protein Secondary Structure Prediction Using Transformers

Maxime, Manzi Kevin

arXiv.org Artificial Intelligence

Predicting protein secondary structures such as alpha helices, beta sheets, and coils from amino acid sequences is essential for understanding protein function. This work presents a transformer-based model that applies attention mechanisms to protein sequence data to predict structural motifs. A sliding-window data augmentation technique is used on the CB513 dataset to expand the training samples. The transformer shows strong ability to generalize across variable-length sequences while effectively capturing both local and long-range residue interactions.
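The sliding-window augmentation described above can be sketched as follows; the window and stride sizes are illustrative assumptions, not values taken from the paper.

```python
# Sketch of sliding-window data augmentation over paired sequence/label
# strings. The window/stride values are hypothetical, chosen for the demo.
def sliding_windows(sequence, labels, window=64, stride=16):
    """Yield (subsequence, sublabels) windows from one training example."""
    assert len(sequence) == len(labels)
    if len(sequence) <= window:
        yield sequence, labels
        return
    for start in range(0, len(sequence) - window + 1, stride):
        yield sequence[start:start + window], labels[start:start + window]

seq = "MKTAYIAKQR" * 10   # toy 100-residue sequence
lab = "HHHHECCCCH" * 10   # toy per-residue labels (H/E/C)
pairs = list(sliding_windows(seq, lab, window=64, stride=16))
# 100 residues, window 64, stride 16 -> windows starting at 0, 16, 32
```

Each window inherits per-residue labels, so one long chain yields several fixed-length training samples while preserving local label alignment.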


Protein Secondary Structure Prediction Using 3D Graphs and Relation-Aware Message Passing Transformers

Varshney, Disha, Garg, Samarth, Tyagi, Sarthak, Varshney, Deeksha, Deep, Nayan, Ekbal, Asif

arXiv.org Artificial Intelligence

In this study, we tackle the challenging task of predicting secondary structures from protein primary sequences, a pivotal initial stride towards predicting tertiary structures, while yielding crucial insights into protein activity, relationships, and functions. Existing methods often utilize extensive sets of unlabeled amino acid sequences. However, these approaches neither explicitly capture nor harness the accessible protein 3D structural data, which is recognized as a decisive factor in dictating protein functions. To address this, we utilize protein residue graphs and introduce various forms of sequential or structural connections to capture enhanced spatial information. We combine Graph Neural Networks (GNNs) and Language Models (LMs), specifically utilizing a pre-trained transformer-based protein language model to encode amino acid sequences and employing message-passing mechanisms like GCN and R-GCN to capture geometric characteristics of protein structures. Employing convolution within a specific node's nearby region, including relations, we stack multiple convolutional layers to efficiently learn combined insights from the protein's spatial graph, revealing intricate interconnections and dependencies in its structure. To assess our model's performance, we employed the training dataset provided by NetSurfP-2.0, which outlines secondary structure in 3- and 8-states. Extensive experiments show that our proposed model, SSRGNet, surpasses the baseline on F1-scores. Introduction: Proteins serve as essential components within cells and are involved in various applications, spanning from therapeutics to materials. They are composed of a sequence of amino acids that fold into distinct shapes. With the development of affordable sequencing technologies [1, 2], a substantial number of novel protein sequences have been identified in recent times.
However, annotating the functional properties of a newly discovered protein sequence is still a laborious and expensive process. Thus, there is a need for reliable and efficient computational methods to accurately predict and assign functions to proteins, thereby bridging the gap between sequence information and functional knowledge. The analysis of protein structure, particularly the tertiary structure, is highly significant for practical applications related to proteins, such as understanding their functions and designing drugs [3].
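The relation-aware message passing described in this abstract can be illustrated with a minimal R-GCN-style layer: each edge relation (e.g. sequential vs. spatial contact) gets its own weight matrix, and messages are aggregated per relation. This is a generic numpy sketch of the R-GCN update, not the paper's implementation; all names and sizes are illustrative.

```python
# Minimal numpy sketch of one relation-aware message-passing (R-GCN-style)
# layer over a protein residue graph. Illustrative only.
import numpy as np

def rgcn_layer(h, edges_by_relation, weights, w_self):
    """h: (n, d) node features; edges_by_relation: {rel: [(src, dst), ...]};
    weights: {rel: (d, d)} per-relation transforms; w_self: (d, d) self-loop."""
    n, d = h.shape
    out = h @ w_self
    for rel, edges in edges_by_relation.items():
        agg = np.zeros((n, d))
        deg = np.zeros(n)
        for src, dst in edges:
            agg[dst] += h[src] @ weights[rel]   # relation-specific message
            deg[dst] += 1
        out += agg / np.maximum(deg, 1)[:, None]  # mean-normalize per node
    return np.maximum(out, 0.0)  # ReLU nonlinearity

rng = np.random.default_rng(0)
h = rng.normal(size=(4, 8))                         # 4 residues, 8-dim features
edges = {"sequential": [(0, 1), (1, 2), (2, 3)],    # backbone neighbors
         "spatial": [(0, 3)]}                       # 3D-contact edge
W = {rel: rng.normal(size=(8, 8)) * 0.1 for rel in edges}
h_next = rgcn_layer(h, edges, W, np.eye(8))
```

Stacking several such layers lets information flow along both sequential and spatial relations, which is the intuition behind combining residue graphs with a language-model encoding.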




PEER: A Comprehensive and Multi-Task Benchmark for Protein Sequence Understanding (Supplementary Material)

Neural Information Processing Systems

For example, the feature of the dipeptide "st" is defined by its dipeptide composition. The Moran feature descriptor defines the distribution of amino acid properties along a protein sequence. It should be noted that there are evident class imbalances in the two multi-class classification tasks. Table 1: Balanced metric (weighted F1) compared with accuracy on multi-class classification tasks. We report mean (std) for each experiment. Used as a feature extractor with pre-trained weights frozen.
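The note on class imbalance is why weighted F1 is reported alongside accuracy: a majority-class predictor can score high accuracy while ignoring minority classes. A pure-Python sketch with illustrative numbers (not benchmark data) makes the contrast concrete.

```python
# Contrast accuracy with class-weighted F1 on an imbalanced toy task.
# The data below is illustrative, not drawn from the PEER benchmark.
from collections import Counter

def weighted_f1(y_true, y_pred):
    """Per-class F1, averaged with weights proportional to class support."""
    support = Counter(y_true)
    total, score = len(y_true), 0.0
    for cls, n in support.items():
        tp = sum(t == p == cls for t, p in zip(y_true, y_pred))
        fp = sum(p == cls and t != cls for t, p in zip(y_true, y_pred))
        fn = sum(t == cls and p != cls for t, p in zip(y_true, y_pred))
        prec = tp / (tp + fp) if tp + fp else 0.0
        rec = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * prec * rec / (prec + rec) if prec + rec else 0.0
        score += (n / total) * f1
    return score

y_true = [0] * 9 + [1]   # 90/10 class imbalance
y_pred = [0] * 10        # majority-class predictor
acc = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)  # 0.9
wf1 = weighted_f1(y_true, y_pred)  # ~0.85: the missed minority class shows
```

The weighted F1 drops below accuracy because the minority class contributes an F1 of zero, which is exactly the imbalance effect the supplementary material points out.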


A Comparative Review of RNA Language Models

Wang, He, Zhang, Yikun, Chen, Jie, Zhan, Jian, Zhou, Yaoqi

arXiv.org Artificial Intelligence

Given the usefulness of protein language models (LMs) in structure and function inference, RNA LMs have received increased attention in the last few years. However, these RNA models are often not compared against the same standard. Here, we divided RNA LMs into three classes (pretrained on multiple RNA types (especially noncoding RNAs), specific-purpose RNAs, and LMs that unify RNA with DNA or proteins or both) and compared 13 RNA LMs along with 3 DNA and 1 protein LMs as controls in zero-shot prediction of RNA secondary structure and functional classification. Results show that models doing well on secondary structure prediction often perform worse on function classification, or vice versa, suggesting that more balanced unsupervised training is needed.


PLM-eXplain: Divide and Conquer the Protein Embedding Space

van Eck, Jan, Gogishvili, Dea, Silva, Wilson, Abeln, Sanne

arXiv.org Artificial Intelligence

Protein language models (PLMs) have revolutionised computational biology through their ability to generate powerful sequence representations for diverse prediction tasks. However, their black-box nature limits biological interpretation and translation to actionable insights. We present an explainable adapter layer - PLM-eXplain (PLM-X), that bridges this gap by factoring PLM embeddings into two components: an interpretable subspace based on established biochemical features, and a residual subspace that preserves the model's predictive power. Using embeddings from ESM2, our adapter incorporates well-established properties, including secondary structure and hydropathy while maintaining high performance. We demonstrate the effectiveness of our approach across three protein-level classification tasks: prediction of extracellular vesicle association, identification of transmembrane helices, and prediction of aggregation propensity. PLM-X enables biological interpretation of model decisions without sacrificing accuracy, offering a generalisable solution for enhancing PLM interpretability across various downstream applications. This work addresses a critical need in computational biology by providing a bridge between powerful deep learning models and actionable biological insights.
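The factorization described in this abstract, an interpretable component plus a residual that keeps the remaining signal, can be sketched with a simple least-squares projection. This is a generic linear sketch under assumed dimensions and synthetic data, not the PLM-X adapter itself.

```python
# Sketch: split embeddings into the part linearly explained by known
# biochemical features and an orthogonal residual. All data is synthetic
# and the dimensions are illustrative, not those of ESM2 or PLM-X.
import numpy as np

rng = np.random.default_rng(1)
n, d, k = 200, 32, 3          # residues, embedding dim, known features
E = rng.normal(size=(n, d))   # stand-in PLM embeddings
F = rng.normal(size=(n, k))   # known features (e.g. hydropathy, SS class)

# Least-squares map from features to embeddings: E ~= F @ B
B, *_ = np.linalg.lstsq(F, E, rcond=None)
E_interp = F @ B              # interpretable component
E_resid = E - E_interp        # residual preserving the remaining signal

# By the normal equations, the residual is orthogonal to the features.
assert np.allclose(F.T @ E_resid, 0.0, atol=1e-8)
```

Downstream heads can then be trained on `E_interp + E_resid` with no information lost, while attributions on `E_interp` map back to named biochemical properties.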


Differentiable Folding for Nearest Neighbor Model Optimization

Krueger, Ryan K., Aviran, Sharon, Mathews, David H., Zuber, Jeffrey, Ward, Max

arXiv.org Artificial Intelligence

The Nearest Neighbor model is the $\textit{de facto}$ thermodynamic model of RNA secondary structure formation and is a cornerstone of RNA structure prediction and sequence design. The current functional form (Turner 2004) contains $\approx13,000$ underlying thermodynamic parameters, and fitting these to both experimental and structural data is computationally challenging. Here, we leverage recent advances in $\textit{differentiable folding}$, a method for directly computing gradients of the RNA folding algorithms, to devise an efficient, scalable, and flexible means of parameter optimization that uses known RNA structures and thermodynamic experiments. Our method yields a significantly improved parameter set that outperforms existing baselines on all metrics, including an increase in the average predicted probability of ground-truth sequence-structure pairs for a single RNA family by over 23 orders of magnitude. Our framework provides a path towards drastically improved RNA models, enabling the flexible incorporation of new experimental data, definition of novel loss terms, large training sets, and even treatment as a module in larger deep learning pipelines. We make available a new database, RNAometer, with experimentally-determined stabilities for small RNA model systems.
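The core idea of fitting thermodynamic parameters by gradient descent can be shown on a toy problem: maximize the Boltzmann probability of a known structure among a small enumerated ensemble whose energies are linear in the parameters. This is a hand-rolled illustrative sketch, vastly simpler than differentiable folding over the full Turner 2004 parameter set.

```python
# Toy gradient-based parameter fitting: push the ground-truth structure's
# Boltzmann probability up by adjusting energy parameters theta.
# Feature counts and the ensemble are invented for the demo.
import numpy as np

feats = np.array([[2.0, 1.0],   # per-structure feature counts
                  [1.0, 3.0],   # (e.g. loop-type occurrences)
                  [0.0, 2.0]])
target = 0                      # index of the ground-truth structure

def grad(theta):
    """Gradient of -log p(target) where E_s = feats[s] . theta and
    p(s) = exp(-E_s) / Z (derived from d/dtheta of E_t + log Z)."""
    energies = feats @ theta
    p = np.exp(-energies) / np.sum(np.exp(-energies))
    return feats[target] - p @ feats

theta = np.zeros(2)
for _ in range(1000):           # plain gradient descent
    theta -= 0.1 * grad(theta)

p_final = np.exp(-feats @ theta) / np.sum(np.exp(-feats @ theta))
# p_final[target] approaches 1 as theta separates the ensemble energies
```

Differentiable folding generalizes this pattern: gradients of the folding recursions themselves replace the hand-derived gradient, so the same loss-driven updates scale to thousands of parameters and full structure databases.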